I. Data Pre-processing

Load Data

Remove rows with an empty SalePrice
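A minimal pandas sketch of these two steps. It assumes the data arrives as a CSV file with a `SalePrice` column. The tiny inline CSV is a made-up stand-in for the real file, whose name is not given here.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV; in practice this would be
# pd.read_csv(<path to the dataset>), with the path assumed.
csv_text = """Id,LotArea,OverallQual,SalePrice
1,8450,7,208500
2,9600,6,
3,11250,7,223500
4,9550,7,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Empty SalePrice fields are parsed as NaN; drop those rows because the target
# is required for both the regression and the classification models.
df = df.dropna(subset=["SalePrice"]).reset_index(drop=True)

print(df.shape)  # (2, 4)
```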

II. EDA and Data Cleaning

A) Exploration of correlations

B) Exploration / Visualization

Next, we look at the correlation of 'SalePrice' with a few features that are important and relevant from a business perspective.
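A sketch of this correlation check with pandas. The mini-dataset is invented, and the feature names (`OverallQual`, `GrLivArea`, `YrSold`) are assumptions that follow the Ames housing naming style. They are not necessarily the features chosen in this notebook.

```python
import pandas as pd

# Hypothetical mini-dataset, for illustration only.
df = pd.DataFrame({
    "OverallQual": [5, 6, 7, 8, 9],
    "GrLivArea": [1000, 1400, 1500, 2000, 2400],
    "YrSold": [2008, 2006, 2010, 2007, 2009],
    "SalePrice": [120000, 150000, 180000, 250000, 320000],
})

# Features picked on business sense (this selection is an assumption).
features = ["OverallQual", "GrLivArea", "YrSold"]

# Pearson correlation of each selected feature with the target, strongest first.
corr_with_price = (
    df[features + ["SalePrice"]]
    .corr()["SalePrice"]
    .drop("SalePrice")
    .sort_values(ascending=False)
)
print(corr_with_price)
```

`DataFrame.corr()` defaults to Pearson correlation. Passing `method="spearman"` would be a reasonable alternative for skewed variables such as prices.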

III. Lasso Regression

The dataset contains many features. It is reasonable to assume that some of them are irrelevant or not significant and should not be included in our model.

Lasso regression is therefore appropriate: its L1 penalty can shrink some coefficients exactly to zero, which removes the corresponding features from the regression.
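For reference, the Lasso objective in the form used by scikit-learn's `Lasso` is shown below. Here $n$ is the number of samples, $p$ the number of features and $\alpha \ge 0$ the penalty strength. The absolute-value (L1) term is what allows individual $\beta_j$ to become exactly zero.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
\frac{1}{2n}\sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^2
\;+\; \alpha \sum_{j=1}^{p} \lvert \beta_j \rvert
```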

Encode categorical features
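One common way to do this step is one-hot encoding with `pandas.get_dummies`. The sketch below uses a hypothetical `Neighborhood` column. Treating every non-numeric column as categorical, and using `drop_first=True`, are assumptions rather than the notebook's confirmed choices.

```python
import pandas as pd

# Hypothetical data with one categorical and two numeric columns.
df = pd.DataFrame({
    "Neighborhood": ["NAmes", "CollgCr", "NAmes", "OldTown"],
    "LotArea": [9600, 8450, 11250, 9550],
    "SalePrice": [181500, 208500, 223500, 140000],
})

# Treat every non-numeric column as categorical.
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# One-hot encode; drop_first=True drops one dummy per feature so the remaining
# dummies are not perfectly collinear with the intercept.
df_encoded = pd.get_dummies(df, columns=categorical_cols, drop_first=True, dtype=int)
print(df_encoded.columns.tolist())
```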

Train-test split
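A sketch of the split with scikit-learn's `train_test_split`. The random data is a stand-in, and the 80/20 ratio and `random_state` value are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data standing in for the encoded dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "LotArea": rng.integers(5000, 15000, size=100),
    "OverallQual": rng.integers(1, 11, size=100),
    "SalePrice": rng.integers(35000, 755000, size=100),
})

X = df.drop(columns="SalePrice")  # features
y = df["SalePrice"]               # target

# Hold out 20% of the rows for testing; fixing random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (80, 2) (20, 2)
```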

Feature scaling (standardization)

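Theoretical notes on feature scaling (source of the code below):

We use RobustScaler because our data contains many outliers. RobustScaler removes the median and scales the data according to a quantile range (by default the interquartile range, between the 25th and 75th percentiles), so a few extreme values have far less influence on the scaling than they would on a mean/standard-deviation standardization.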
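A small illustration of RobustScaler on toy numbers, where the last training value is a deliberate outlier. The scaler is fitted on the training data only and then reused on the test data, so no test-set statistics leak into preprocessing.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Toy single-feature data; 100.0 is an outlier.
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
X_test = np.array([[3.0], [50.0]])

# Defaults: subtract the median, divide by the IQR (75th minus 25th percentile).
scaler = RobustScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns median=3, IQR=4-2=2
X_test_scaled = scaler.transform(X_test)        # reuses the training statistics

print(X_train_scaled.ravel())  # [-1.  -0.5  0.   0.5 48.5]
print(X_test_scaled.ravel())   # [ 0.  23.5]
```

For comparison, the mean of the training values is 22, pulled far from the bulk of the data by the single outlier, while the median stays at 3.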

Modeling
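A sketch of the feature-selection behaviour described above, using scikit-learn's `Lasso` on synthetic data in which only the first two of five features drive the target. The data and `alpha=0.1` are illustrative assumptions, not the notebook's settings. In practice `LassoCV` can choose `alpha` by cross-validation.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_samples, n_features = 200, 5
X = rng.normal(size=(n_samples, n_features))  # already on a comparable scale
# Only features 0 and 1 influence the synthetic target.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=n_samples)

# alpha sets the strength of the L1 penalty.
lasso = Lasso(alpha=0.1)
lasso.fit(X, y)

# Features whose coefficient was not driven exactly to zero.
selected = np.flatnonzero(lasso.coef_)
print(np.round(lasso.coef_, 3))
print("selected features:", selected)
```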

Model performance
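The metrics used for this evaluation are not stated here. As an assumption, the sketch computes three common regression metrics on a held-out test set: RMSE, MAE and R². The data and `alpha` are synthetic.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = 50.0 * X[:, 0] + 20.0 * X[:, 1] + rng.normal(scale=5.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = Lasso(alpha=0.5).fit(X_train, y_train)
y_pred = model.predict(X_test)

# RMSE and MAE are in the target's units (dollars for SalePrice); R^2 is unitless.
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"RMSE={rmse:.2f}  MAE={mae:.2f}  R^2={r2:.3f}")
```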

IV. Decision Tree Classifier

The target variable 'SalePrice' ranges from 35K to 755K dollars. We divided this range into three classes (Low, Medium and High) so that the output can be used in BAU (business-as-usual) settings. To predict the class into which a particular property falls, we then perform a Decision Tree Classification analysis.
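A sketch of the binning and classification steps. The cut points (150K and 250K), the synthetic features and prices, and the tree settings are all hypothetical, because the notebook's actual class boundaries are not stated here.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "OverallQual": rng.integers(1, 11, size=n),
    "GrLivArea": rng.integers(500, 4000, size=n),
})
# Synthetic price loosely driven by the two features, clipped to the 35K-755K range.
df["SalePrice"] = (20000 * df["OverallQual"] + 60 * df["GrLivArea"]
                   + rng.normal(0, 15000, size=n)).clip(35000, 755000)

# Hypothetical class boundaries: (0, 150K] Low, (150K, 250K] Medium, above 250K High.
bins = [0, 150000, 250000, np.inf]
df["PriceClass"] = pd.cut(df["SalePrice"], bins=bins, labels=["Low", "Medium", "High"])

X = df[["OverallQual", "GrLivArea"]]
y = df["PriceClass"].astype(str)
# stratify=y keeps the class proportions similar in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

clf = DecisionTreeClassifier(max_depth=4, random_state=42)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```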

As we can see from the results above, out of 84 properties in the High class, the Decision Tree Classifier correctly classified 69, a per-class accuracy (recall) of just over 82% (69/84 ≈ 82.1%). Likewise, out of 98 properties in the Low class, it correctly classified 73, a per-class accuracy of about 74% (73/98 ≈ 74.5%).
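The per-class figures above are recall values: correct predictions for a class divided by the number of properties that truly belong to it. The sketch computes this from a confusion matrix laid out like scikit-learn's `confusion_matrix` (rows = true class, columns = predicted class). The 3×3 matrix is a made-up example. The last two lines reproduce the counts quoted in the text.

```python
import numpy as np


def per_class_recall(cm) -> np.ndarray:
    """Recall per class: diagonal (correct predictions) divided by row sums (true members)."""
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)


# Hypothetical confusion matrix, for illustration only.
toy_cm = [[8, 2, 0],
          [1, 7, 2],
          [0, 3, 7]]
print(per_class_recall(toy_cm))  # [0.8 0.7 0.7]

# Counts quoted in the text above.
high_recall = 69 / 84
low_recall = 73 / 98
print(round(high_recall, 4), round(low_recall, 4))  # 0.8214 0.7449
```

`sklearn.metrics.recall_score(y_true, y_pred, average=None)` returns the same per-class values directly from the labels.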